{
  "cells": [
    {
      "cell_type": "markdown",
      "id": "a050dfc0",
      "metadata": {
        "id": "a050dfc0"
      },
      "source": [
        "# Sesame CSM (1B) - Text to Speech"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "7c2c61ec",
      "metadata": {
        "id": "7c2c61ec"
      },
      "source": [
        "This notebook showcases how to generate speech from text using the Sesame CSM 1B model. This is ideal for converting instructional or conversational content into natural audio output."
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/DhivyaBharathy-web/PraisonAI/blob/main/examples/cookbooks/Sesame_CSM_1B_TTS.ipynb)\n"
      ],
      "metadata": {
        "id": "5SA94wTb9AH-"
      },
      "id": "5SA94wTb9AH-"
    },
    {
      "cell_type": "markdown",
      "id": "ab27fd8d",
      "metadata": {
        "id": "ab27fd8d"
      },
      "source": [
        "## 🧩 Dependencies"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "id": "ac3a627f",
      "metadata": {
        "id": "ac3a627f"
      },
      "outputs": [],
      "source": [
        "!pip install -q transformers\n",
        "!pip install -q torchaudio\n",
        "!pip install -q soundfile"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "a382e4d0",
      "metadata": {
        "id": "a382e4d0"
      },
      "source": [
        "## 🛠️ Tools\n",
        "- transformers for loading the model\n",
        "- torchaudio for audio processing\n",
        "- soundfile for playback"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "7ff6eb6d",
      "metadata": {
        "id": "7ff6eb6d"
      },
      "source": [
        "## 🧾 YAML Prompt\n",
        "```yaml\n",
        "task: \"Text to Speech\"\n",
        "style: \"Clear, educational\"\n",
        "language: \"en\"\n",
        "```"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "0ab9cda8",
      "metadata": {
        "id": "0ab9cda8"
      },
      "source": [
        "## 🧠 Main"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "id": "6459f8bd",
      "metadata": {
        "id": "6459f8bd"
      },
      "outputs": [],
      "source": [
        "from transformers import AutoProcessor, AutoModelForSpeechSeq2Seq\n",
        "import torchaudio\n",
        "import soundfile as sf\n",
        "import torch\n",
        "\n",
        "processor = AutoProcessor.from_pretrained(\"m-a-p/Sesame-CM\")\n",
        "model = AutoModelForSpeechSeq2Seq.from_pretrained(\"m-a-p/Sesame-CM\")\n",
        "model = model.to(\"cuda\" if torch.cuda.is_available() else \"cpu\")\n",
        "\n",
        "text = \"Welcome to the world of voice synthesis using open models!\"\n",
        "inputs = processor(text, return_tensors=\"pt\").to(model.device)\n",
        "with torch.no_grad():\n",
        "    outputs = model.generate(**inputs)\n",
        "speech = processor.batch_decode(outputs, return_tensors=\"pt\")[0]\n",
        "\n",
        "sf.write(\"output.wav\", speech.numpy(), 16000)"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "376fe112",
      "metadata": {
        "id": "376fe112"
      },
      "source": [
        "## 📤 Output\n",
        "🖼️ Output Preview (Text Summary):\n",
        "\n",
        "Prompt: A clear educational text is converted to a .wav file.\n",
        "\n",
        "🎧 The output audio will say: 'Welcome to the world of voice synthesis using open models!'\n",
        "This demonstrates how Sesame-CM can be used for building TTS applications easily."
      ]
    }
  ],
  "metadata": {
    "colab": {
      "provenance": []
    }
  },
  "nbformat": 4,
  "nbformat_minor": 5
}